    Indirect Methods for Robot Skill Learning

    Robot learning algorithms are appealing alternatives for acquiring rational robotic behaviors from data collected during the execution of tasks. Most robot learning techniques, however, are formulated as isolated stages and focus on directly obtaining rational policies by optimizing only the performance measure of a single task. Formulating robotic skill acquisition in this way has several disadvantages. For example, if the same skill has to be learned by different robots, an independent learning process must be carried out to acquire an exclusive policy for each robot. Similarly, if a robot has to learn diverse skills, it must acquire the policy for each task in a separate learning process, sequentially and commonly starting from scratch. Likewise, formulating the learning process in terms of the performance measure alone means that robots avoid situations that should not be repeated only as a side effect, without any mechanism that explicitly captures the need to avoid those wrong behaviors. In contrast, humans and other animals exploit their experience not only to improve their performance on the task they are currently executing, but also to indirectly construct multiple models that help them with that particular task and generalize to new problems. Accordingly, the models and algorithms proposed in this thesis seek to be more data efficient and to extract more information from the interaction data collected either from expert demonstrations or from the robot's own experience. The first approach encodes robotic skills with shared latent variable models, obtaining latent representations that can be transferred from one robot to others and thereby avoiding the need to learn the same task from scratch.
The second approach learns complex rational policies by representing them as hierarchical models that can perform multiple concurrent tasks and whose components are learned in a single learning process instead of separate ones. Finally, the third approach uses the interaction data to learn two alternative, antagonistic policies that capture what to do and what not to do, and that influence the learning process in addition to the performance measure defined for the task.